Path: blob/master/Part 2 - Regression/Multiple Linear Regression/[R] Multiple Linear Regression.ipynb
Kernel: R
Multiple Linear Regression
Dependent variable (green): Profit
Independent variables (blue): R.D.Spend, Administration, Marketing.Spend, State
Data Preprocessing
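The notebook's import cell is not preserved here. As a minimal sketch of the step, the block below reads a CSV with `read.csv` and shows why the model output uses names such as `R.D.Spend`: by default R rewrites header characters that are not valid in a variable name (spaces, `&`) as dots. The header text and the two data rows are placeholders written to a temporary file, not the notebook's real data.

```r
# Hypothetical two-row stand-in for the real CSV, written to a temp file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "R&D Spend,Administration,Marketing Spend,State,Profit",
  "1000,2000,3000,New York,4000",
  "1500,2500,3500,California,4500"
), csv_path)

# read.csv() applies make.names() to the header (check.names = TRUE)
dataset <- read.csv(csv_path)
names(dataset)
```

Passing `check.names = FALSE` would keep the original headers, but formulas would then need backticks around names that contain spaces.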
Encoding categorical data
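The coefficient names `State2` and `State3` in the model output suggest that `State` was turned into a factor with levels labelled 1 to 3. A minimal sketch of that encoding follows; the state names and profit values are assumptions for illustration, since the original cell is not preserved.

```r
# Placeholder data: the state names and profits are assumed, not from the notebook
dataset <- data.frame(
  State  = c("New York", "California", "Florida", "New York"),
  Profit = c(4000, 4500, 3900, 4200)
)

# Map each category to a numbered factor level
dataset$State <- factor(dataset$State,
                        levels = c("New York", "California", "Florida"),
                        labels = c(1, 2, 3))
levels(dataset$State)

# lm() expands a factor into dummy columns and drops the first level as baseline
design_columns <- colnames(model.matrix(Profit ~ State, data = dataset))
design_columns
```

Because level 1 is the baseline, only `State2` and `State3` appear as coefficients, and each is read as a difference from level 1.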
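The models in this notebook are fitted on `training_set`, which holds 40 observations (34 residual degrees of freedom plus 6 coefficients). The split cell itself is not preserved, so the sketch below assumes an 80/20 split of 50 rows and uses base R `sample()`; the names `train_idx` and `test_set` and the one-column stand-in data frame are hypothetical.

```r
set.seed(123)                       # make the random split reproducible
dataset <- data.frame(id = 1:50)    # stand-in for the real data (assumed 50 rows)

# Draw 80% of the row indices for training; the rest form the test set
train_idx    <- sample(nrow(dataset), size = round(0.8 * nrow(dataset)))
training_set <- dataset[train_idx, , drop = FALSE]
test_set     <- dataset[-train_idx, , drop = FALSE]

nrow(training_set)
nrow(test_set)
```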
Fitting Multiple Linear Regression to the training set
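A sketch of the fitting step, matching the call `lm(formula = Profit ~ ., data = training_set)` shown in the output. The data here is synthetic (generated under an assumed linear relationship), so the numbers will not match the notebook's output; `regressor` is a hypothetical variable name.

```r
# Synthetic stand-in for the notebook's data (all values are made up)
set.seed(42)
n <- 50
dataset <- data.frame(
  R.D.Spend       = runif(n, 0, 165000),
  Administration  = runif(n, 50000, 180000),
  Marketing.Spend = runif(n, 0, 470000),
  State           = factor(rep(1:3, length.out = n))
)
dataset$Profit <- 50000 + 0.8 * dataset$R.D.Spend + rnorm(n, sd = 9000)

# The rows are generated at random, so the first 40 serve as the training set
training_set <- dataset[1:40, ]

# "Profit ~ ." regresses Profit on every other column of training_set
regressor <- lm(formula = Profit ~ ., data = training_set)
summary(regressor)
```

With 40 training rows and 6 estimated coefficients (intercept, three numeric predictors, two State dummies), the fit has 34 residual degrees of freedom, as in the notebook's summary.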
Out[25]:
Call:
lm(formula = Profit ~ ., data = training_set)

Residuals:
   Min     1Q Median     3Q    Max
-33128  -4865      5   6098  18065

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      4.965e+04  7.637e+03   6.501 1.94e-07 ***
R.D.Spend        7.986e-01  5.604e-02  14.251 6.70e-16 ***
Administration  -2.942e-02  5.828e-02  -0.505    0.617
Marketing.Spend  3.268e-02  2.127e-02   1.537    0.134
State2           1.213e+02  3.751e+03   0.032    0.974
State3           2.376e+02  4.127e+03   0.058    0.954
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9908 on 34 degrees of freedom
Multiple R-squared:  0.9499,    Adjusted R-squared:  0.9425
F-statistic:   129 on 5 and 34 DF,  p-value: < 2.2e-16
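The adjusted R-squared in the summary above can be reproduced from the multiple R-squared with the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1). The sketch below plugs in the printed values; `adjusted_r2` is a hypothetical helper name, and n = 40 is inferred from 34 residual degrees of freedom plus 6 estimated coefficients.

```r
# Adjusted R-squared from R-squared, sample size n, and predictor count p
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n <- 34 + 6   # residual degrees of freedom + estimated coefficients
adj <- adjusted_r2(r2 = 0.9499, n = n, p = 5)
round(adj, 4)
```

The result agrees with the printed 0.9425 up to the rounding of the inputs. Unlike R-squared, this quantity can decrease when a predictor that adds little is included, which is why it is useful for comparing models of different sizes.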
Predicting the Test set results
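A minimal sketch of the prediction step using `predict()` on held-out rows. The data is synthetic, and `regressor`, `test_set`, and `y_pred` are hypothetical names, since the original cells are not preserved.

```r
# Synthetic stand-in for the notebook's data (all values are made up)
set.seed(42)
n <- 50
dataset <- data.frame(
  R.D.Spend       = runif(n, 0, 165000),
  Administration  = runif(n, 50000, 180000),
  Marketing.Spend = runif(n, 0, 470000),
  State           = factor(rep(1:3, length.out = n))
)
dataset$Profit <- 50000 + 0.8 * dataset$R.D.Spend + rnorm(n, sd = 9000)

training_set <- dataset[1:40, ]
test_set     <- dataset[41:50, ]

regressor <- lm(formula = Profit ~ ., data = training_set)

# newdata must contain the same predictor columns used in the fit
y_pred <- predict(regressor, newdata = test_set)

# Side-by-side view of actual and predicted profit
data.frame(actual = test_set$Profit, predicted = y_pred)
```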
Fitting Linear Regression to the training set using only R&D Spend
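Restricting the formula to a single predictor gives a simple linear regression. The sketch below fits `Profit ~ R.D.Spend` on synthetic training data; `regressor_rd` is a hypothetical name and the generated values are not the notebook's.

```r
# Synthetic 40-row training set (values are made up)
set.seed(42)
n <- 40
training_set <- data.frame(R.D.Spend = runif(n, 0, 165000))
training_set$Profit <- 50000 + 0.8 * training_set$R.D.Spend + rnorm(n, sd = 9000)

# Name the predictor explicitly instead of using "."
regressor_rd <- lm(formula = Profit ~ R.D.Spend, data = training_set)

coef(regressor_rd)          # intercept and slope
df.residual(regressor_rd)   # 40 observations - 2 coefficients
```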
Building the optimal model using Backward Elimination
Out[33]:
Call:
lm(formula = Profit ~ R.D.Spend + Administration + Marketing.Spend +
    State, data = training_set)

Residuals:
   Min     1Q Median     3Q    Max
-33128  -4865      5   6098  18065

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      4.965e+04  7.637e+03   6.501 1.94e-07 ***
R.D.Spend        7.986e-01  5.604e-02  14.251 6.70e-16 ***
Administration  -2.942e-02  5.828e-02  -0.505    0.617
Marketing.Spend  3.268e-02  2.127e-02   1.537    0.134
State2           1.213e+02  3.751e+03   0.032    0.974
State3           2.376e+02  4.127e+03   0.058    0.954
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9908 on 34 degrees of freedom
Multiple R-squared:  0.9499,    Adjusted R-squared:  0.9425
F-statistic:   129 on 5 and 34 DF,  p-value: < 2.2e-16
Out[34]:
Call:
lm(formula = Profit ~ R.D.Spend + Administration + Marketing.Spend,
    data = training_set)

Residuals:
   Min     1Q Median     3Q    Max
-33117  -4858    -36   6020  17957

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      4.970e+04  7.120e+03   6.980 3.48e-08 ***
R.D.Spend        7.983e-01  5.356e-02  14.905  < 2e-16 ***
Administration  -2.895e-02  5.603e-02  -0.517    0.609
Marketing.Spend  3.283e-02  1.987e-02   1.652    0.107
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9629 on 36 degrees of freedom
Multiple R-squared:  0.9499,    Adjusted R-squared:  0.9457
F-statistic: 227.6 on 3 and 36 DF,  p-value: < 2.2e-16
Out[35]:
Call:
lm(formula = Profit ~ R.D.Spend + Marketing.Spend, data = training_set)

Residuals:
   Min     1Q Median     3Q    Max
-33294  -4763   -354   6351  17693

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     4.638e+04  3.019e+03  15.364   <2e-16 ***
R.D.Spend       7.879e-01  4.916e-02  16.026   <2e-16 ***
Marketing.Spend 3.538e-02  1.905e-02   1.857   0.0713 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9533 on 37 degrees of freedom
Multiple R-squared:  0.9495,    Adjusted R-squared:  0.9468
F-statistic: 348.1 on 2 and 37 DF,  p-value: < 2.2e-16
Out[36]:
Call:
lm(formula = Profit ~ R.D.Spend, data = training_set)

Residuals:
   Min     1Q Median     3Q    Max
-34334  -4894   -340   6752  17147

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.902e+04  2.748e+03   17.84   <2e-16 ***
R.D.Spend   8.563e-01  3.357e-02   25.51   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9836 on 38 degrees of freedom
Multiple R-squared:  0.9448,    Adjusted R-squared:  0.9434
F-statistic: 650.8 on 1 and 38 DF,  p-value: < 2.2e-16
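The manual steps above (refit, drop the least significant predictor, repeat) can be sketched as a loop. This is one possible automation, not the notebook's own code: `backward_eliminate` and its `sl` (significance level, assumed 0.05) are hypothetical, the data is synthetic, and it uses `drop1(..., test = "F")` rather than reading p-values off `summary()`. For single-coefficient terms the two p-values coincide; for the factor `State`, `drop1` tests both dummy columns jointly.

```r
# Repeatedly drop the term with the largest p-value until all are <= sl
backward_eliminate <- function(data, response, sl = 0.05) {
  predictors <- setdiff(names(data), response)
  while (length(predictors) > 0) {
    model <- lm(reformulate(predictors, response = response), data = data)
    tab <- drop1(model, test = "F")        # first row is "<none>"
    p_values <- tab[["Pr(>F)"]][-1]
    if (max(p_values) <= sl) return(model)
    worst <- rownames(tab)[-1][which.max(p_values)]
    predictors <- setdiff(predictors, worst)
  }
  # Nothing survived: fall back to the intercept-only model
  lm(reformulate("1", response = response), data = data)
}

# Synthetic training data in which only R.D.Spend drives Profit (made-up values)
set.seed(42)
n <- 40
training_set <- data.frame(
  R.D.Spend       = runif(n, 0, 165000),
  Administration  = runif(n, 50000, 180000),
  Marketing.Spend = runif(n, 0, 470000),
  State           = factor(rep(1:3, length.out = n))
)
training_set$Profit <- 50000 + 0.8 * training_set$R.D.Spend + rnorm(n, sd = 9000)

final_model <- backward_eliminate(training_set, response = "Profit")
formula(final_model)
```

In the notebook's run, the same rule at a 0.05 threshold removes State, then Administration (p = 0.609), then Marketing.Spend (p = 0.0713), leaving R.D.Spend alone. The adjusted R-squared is highest for the two-predictor model (0.9468 versus 0.9434), so a strict p-value cutoff and adjusted R-squared do not pick the same model here.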